Abstract

The transfer of organelle DNA fragments to the nuclear genome is frequently observed in eukaryotes. These 
transfers are thought to play an important role in gene and genome evolution of eukaryotes. In plants, such 
transfers occur from plastid to nuclear [nuclear plastid DNAs (NUPTs)] and mitochondrial to nuclear 
(nuclear mitochondrial DNAs) genomes. The amount and genomic organization of organelle DNA fragments 
have been studied in model plant species, such as Arabidopsis thaliana and rice. At present, publicly available 
genomic data can be used to conduct such studies in non-model plants. In this study, we analysed the amount 
and genomic organization of NUPTs in 17 plant species for which genome sequences are available. The
amount and distribution of NUPTs varied among the species. We also estimated the distribution of NUPTs 
according to the time of integration (relative age) by conducting sequence similarity analysis between 
NUPTs and the plastid genome. The age distributions suggested that the present genomic constitutions of
NUPTs could be explained by the combination of rapidly eliminated deleterious parts and a small number of
constantly existing, less deleterious parts.

1. Introduction

In general, two cytoplasmic organelles in plants have their own genomes:
chloroplasts and mitochondria. These organelles are thought to have acquired
these genomes via an endosymbiotic mechanism during the early evolution of
eukaryotic species.1,2 Chloroplasts are considered to have originated from
cyanobacteria, although their genome size (around 150 kb) and gene number
(about 100 protein-coding genes) are remarkably reduced compared with those
of cyanobacteria (genome size, around 7 Mb; gene number, more than 5000
protein-coding genes).3,4
The major parts of the chloroplast genome were simply eliminated since they
were redundant with the nuclear genes. Furthermore, some other parts of the
cytoplasmic genomes were transferred to the nuclear genome, and the original
copies were lost.3,5 The transfer of DNA fragments occurred among the three
genomes, although the direction of transfer was not uniform. Transfer from
the cytoplasmic genomes to the nuclear genome is considerably more frequent
than the opposite, and transfer between organelle genomes is considered to
be rare.6,7 The pattern of chloroplast DNA integration in the nuclear genome
(so-called nuclear plastid DNA: NUPT) has been investigated in several
species, and the mechanism of integration and the genomic organization of
NUPTs have been analysed in detail.7,8

The transfer of DNA fragments from the chloroplast to the nuclear genome is
still active.9-11 The actual transfer rate was estimated to be about one per
16,000 pollen grains (about 6 × 10^-5 per pollen grain; ref. 9) or one per
5 × 10^6 cells (2 × 10^-7 per cell; ref. 11) by measuring functional gene
integration into the nuclear genome. When the length and functional ability
of genes are considered, the total transfer rate of DNA fragments from the
chloroplast to the nuclear genome would be higher than that estimated by
direct experiments.9,11,12 The transfer of chloroplast fragments occurred
via both simple and complex structural organizations.10 These fragments form
continuous, rearranged, inter-chromosomally rearranged, and mosaic-structured
patterns in the nuclear genome.6 NUPTs also tend to be located close to each
other,13 suggesting simultaneous integration and/or biased integration
preferences.14 Although the integration mechanism of de novo NUPTs is still
not completely elucidated, non-homologous recombination and/or
non-homologous end joining during double-strand break repair are suggested
to be the integration mechanisms, as for other extra-nuclear DNAs.15,16 The
integration mainly occurs during male gametogenesis17 and is increased by
environmental stresses.18,19 After chloroplast DNA fragments are integrated
into the nuclear genome, newly formed NUPTs are sometimes unstable and are
lost rapidly.20 Even when some NUPTs escape from the early unstable stage,
they are fragmented and eliminated from the genome.13,14,21

Although the integration mechanisms, genomic organization, and evolution of
NUPTs have been analysed in detail, most studies were performed using Oryza
sativa and Arabidopsis thaliana.13,14,21-23 Despite frequent transfer and
loss of genomic regions during evolution, the size and structure of the
chloroplast genome are conserved among species, facilitating the elucidation
of the general trend of DNA fragment transfer. Furthermore, because of its
simple structure, the chloroplast genome has been sequenced in many more
species than the mitochondrial genome. In recent years, whole-genome
sequences of >20 phylogenetically diverse plants have been published.
Estimating the genomic organization of NUPTs in these plant species has
therefore become possible, facilitating an understanding of the general
rules of structural patterns and evolutionary history of NUPTs. Here, we
report the amounts and structures of NUPTs from 17 plant species and discuss
the general patterns that underlie the acquisition, maintenance, and
elimination of nuclear-localized chloroplast DNA fragments.

2. Materials and Methods 

2.1. Data used 

Of the species whose whole-genome sequences have been reported, 7 with
sequences for all 3 genomes (nuclear, chloroplast, and mitochondrial) and 10
with nuclear and chloroplast genome sequences were used in this study.
Because there are some similarities between chloroplast and mitochondrial
genomes, the former seven species were analysed in detail to estimate the
false-positive detection rate of NUPTs, i.e. the proportion of detected
fragments that possibly originated from the mitochondrial genome. These
seven species were as follows: A. thaliana,24,25 Carica papaya,26 Vitis
vinifera,27,28 Lotus japonicus,29,30 O. sativa,31,32 Sorghum bicolor,33,34
and Zea mays.35,36 The species with data for nuclear and chloroplast genomes
only were as follows: Medicago truncatula,37 Glycine max,38,39 Manihot
esculenta,40,41 Ricinus communis,42,43 Populus trichocarpa,44 Cucumis
sativus,45,46 Fragaria vesca,47 Solanum lycopersicum,48,49 Solanum
tuberosum,50,51 and Brachypodium distachyon.52,53

2.2. Identification of NUPTs 

The organelle genome sequences of each species were used as queries to
conduct BLAST searches against the nuclear genome sequences available in
July 2012. In many cases, the NCBI BLAST server was used; bulk data
downloaded from Phytozome ver. 8.0 (ref. 54) or from species-specific
websites (L. japonicus, F. vesca, S. tuberosum, and S. lycopersicum) were
also used to conduct local BLAST searches. The sequence datasets used in
this study are listed in Supplementary Tables S1 and S2. In this study, only
BLAST hits that were 100 bp or longer and had 90% or higher identity to the
chloroplast genome sequence were used for the following analyses. Because
NUPTs were present as fragmented and mosaic structures, some sequences
detected by BLAST search were very short. BLAST scores tend to be low for
short homologous sequences; therefore, we used sequence identities rather
than scores as cut-off criteria. These criteria might have limited our
analysis to only relatively recent transfer events.
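
The length and identity cut-offs described above can be sketched as a small filter over BLAST tabular output. This is a minimal illustration, not the pipeline used in the study: it assumes the standard 12-column tabular layout (the `-outfmt 6` format of BLAST+), and the names `BlastHit`, `parse_tabular`, and `filter_nupt_candidates`, as well as the example hits, are hypothetical.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    """One row of BLAST tabular output (assumed 12-column layout)."""
    query: str
    subject: str
    identity: float  # percent identity
    length: int      # alignment length in bp
    q_start: int
    q_end: int
    s_start: int
    s_end: int


def parse_tabular(lines):
    """Parse tab-separated BLAST rows, skipping blank and comment lines."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        hits.append(BlastHit(f[0], f[1], float(f[2]), int(f[3]),
                             int(f[6]), int(f[7]), int(f[8]), int(f[9])))
    return hits


def filter_nupt_candidates(hits, min_length=100, min_identity=90.0):
    """Keep hits of at least min_length bp with at least min_identity percent identity."""
    return [h for h in hits
            if h.length >= min_length and h.identity >= min_identity]


# Hypothetical example rows (chloroplast query "cp" against nuclear chromosomes).
example = [
    "cp\tchr1\t98.50\t250\t3\t1\t100\t349\t5000\t5249\t1e-100\t400",
    "cp\tchr1\t88.00\t500\t60\t0\t400\t899\t9000\t9499\t1e-80\t300",   # identity too low
    "cp\tchr2\t95.00\t80\t4\t0\t1000\t1079\t100\t179\t1e-20\t100",     # too short
    "cp\tchr3\t90.00\t100\t10\t0\t2000\t2099\t700\t799\t1e-25\t120",   # on both thresholds
]
kept = filter_nupt_candidates(parse_tabular(example))
```

Filtering on identity rather than bit score follows the reasoning in the text: short fragments receive low scores even when they are nearly identical to the chloroplast sequence.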

BLAST hits for the NUPTs found within inverted repeat regions of the
chloroplast genome were obtained in both inverted repeat regions and could
not be distinguished. These BLAST hits were counted only once. Some BLAST
hits were found at the edge of the inverted repeat region and were also
present on the other side of the inverted repeat over a single copy region.
These sequences were counted in the latter position only.

2.3. Identification of regions of NUPT origin on the
chloroplast genome

The chloroplast genomic region from where NUPTs 
originated was identified by assigning each identified 
NUPT to a chloroplast sequence. If NUPTs existed within 
inverted repeat regions, they were simply counted as 
half. After BLAST hits were assigned, the number of 
NUPTs was counted on each nucleotide site of the chloro- 
plast genome. The differences in the presence of NUPTs 
between inverted repeat regions and single copy 
regions were tested by the Welch two-sample t-test. 
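
As a rough sketch of the counting and testing steps above, the per-site NUPT count can be accumulated with a difference array, giving hits in inverted repeats a weight of 0.5, and the two groups of sites can then be compared with a Welch statistic. The function names, the weighting interface, and the toy numbers are assumptions made for illustration; the study's own implementation is not described in this detail.

```python
from statistics import mean, variance


def site_counts(genome_length, hits):
    """Per-site coverage from (start, end, weight) hits, 1-based inclusive.

    A weight of 0.5 is meant for hits inside inverted repeats (assumption
    based on the 'counted as half' rule), 1.0 for all other hits.
    """
    diff = [0.0] * (genome_length + 1)
    for start, end, weight in hits:
        diff[start - 1] += weight
        diff[end] -= weight
    counts, running = [], 0.0
    for i in range(genome_length):
        running += diff[i]
        counts.append(running)
    return counts


def welch_t(a, b):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy example: a 10-bp "genome" with one single-copy hit and one repeat hit.
counts = site_counts(10, [(1, 4, 1.0), (3, 6, 0.5)])
t_stat, dof = welch_t([1, 2, 3, 4], [2, 4, 6, 8, 10])
```

The P-value would then be read from a t distribution with `dof` degrees of freedom; that step is omitted here to keep the sketch free of external libraries.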

2.4. Estimation of age distribution

For each identified NUPT, the sequence identity to the chloroplast genome
was calculated. Although plant organelle genomes are known to have a low
mutation rate (about 1/10th) compared with the nuclear genome,55-57 the
exact nuclear/chloroplast mutation rate ratio is still unclear. Furthermore,
because mutation directions differ between nucleotides in NUPTs,58,59
simple correction methods for genetic distances cannot be applied. Thus, the
p-distance (the proportion of differing sites) was used to estimate the time
of each NUPT integration event (relative age). Because p-distances were
used, the estimated age represents a relative age and does not reflect the
true integration time. If natural selection affected the evolution of the
NUPTs, the substitution rate should vary, especially if mutations were
advantageous.
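
For illustration, the p-distance of an aligned NUPT and chloroplast sequence pair can be computed as the fraction of compared sites that differ. The sketch below assumes that alignment columns containing a gap are skipped, which is one common convention and not necessarily the one used in the study; the function name and example sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are ignored (assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


# One substitution in ten aligned sites.
example_p = p_distance("ACGTACGTAC", "ACGTTCGTAC")
# A gapped column is skipped, leaving five identical sites.
gapped_p = p_distance("ACG-TA", "ACGTTA")
```

Under this convention, a BLAST hit with 96% identity and no gaps corresponds to a p-distance of 0.04, so larger p-distances indicate relatively older insertions.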

2.5. Characterization of flanking regions of NUPTs on 
the chromosomes 

Among the species analysed in this study, detailed information about
transposons was obtained for A. thaliana, O. sativa, and Z. mays from
Repbase.60 For these species, the number of transposable elements (TEs) in
the vicinity of NUPTs was estimated using the program RepeatMasker ver.
4.0.1 (available from: http://www.repeatmasker.org/) to investigate the
features of the regions in which NUPTs were integrated. In all, 5-kb
sequences from both the 5' and 3' flanking regions of NUPTs were extracted.
The number of each type of TE within these regions was estimated using
RepeatMasker in the default mode. For comparison, the number of TEs within
regions randomly extracted from each genome was also estimated.
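
The extraction of 5-kb flanking windows can be sketched as follows. Coordinates are assumed to be 1-based and inclusive, and windows are clipped at chromosome ends; both choices, and the function name `flanking_windows`, are illustrative assumptions rather than details taken from the study.

```python
def flanking_windows(start, end, chrom_length, flank=5000):
    """Return (upstream, downstream) windows around a NUPT.

    Coordinates are 1-based inclusive. A window is clipped at the chromosome
    ends and is None when the NUPT touches the corresponding end.
    """
    upstream = (max(1, start - flank), start - 1) if start > 1 else None
    downstream = (end + 1, min(chrom_length, end + flank)) if end < chrom_length else None
    return upstream, downstream


# A NUPT in the middle of a chromosome gets two full 5-kb windows.
mid = flanking_windows(10_001, 12_000, 1_000_000)
# Near the chromosome ends, the windows are shortened.
clipped = flanking_windows(2_001, 3_000, 4_000)
# A NUPT spanning a whole (hypothetical) scaffold has no flanking sequence.
none_left = flanking_windows(1, 100, 100)
```

The resulting windows would then be written out as sequences and passed to RepeatMasker; the same function applied to random positions gives the comparison set.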

Some genomes retained NUPTs with substantial length and relatively low
identities (i.e. predicted to be long-lived after insertion). In this study,
NUPTs more than 5 kb in length with identities from 90 to 96% were
considered long-lived NUPTs. Flanking 5-kb regions were surveyed to analyse
the regions surrounding long-lived NUPTs. Such long-lived NUPTs tended to be
surrounded by other NUPTs or repeat sequences. First, the presence of other
NUPTs surrounding long-lived NUPTs was verified. When long-lived NUPTs
formed clusters with other NUPTs, the outer regions of such NUPT clusters
were analysed as the flanking regions of the long-lived NUPTs. Next, the
occurrence of repeat sequences in the flanking regions of long-lived NUPTs
was analysed. Flanking 5-kb regions were used as queries to conduct BLAST
searches against each nuclear genome sequence. If the entire or partial
sequences in the 5-kb flanking regions were found >10 times in the whole
genome, such sequences were tentatively defined as repeat sequences. The
number of such repeat sequences in the flanking regions was estimated.
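
The two working definitions above (long-lived NUPTs and tentative repeat sequences) translate directly into simple predicates. The sketch below uses hypothetical function names and example values; whether the identity bounds were treated as inclusive is an assumption.

```python
def is_long_lived(length_bp, identity_pct):
    """NUPT longer than 5 kb with 90-96% identity (bounds assumed inclusive)."""
    return length_bp > 5000 and 90.0 <= identity_pct <= 96.0


def is_repeat(genome_hit_count, threshold=10):
    """Flanking sequence found more than `threshold` times in the genome."""
    return genome_hit_count > threshold


# Hypothetical candidates given as (length in bp, percent identity).
candidates = [(6200, 94.5), (7000, 99.1), (1200, 92.0), (5001, 96.0)]
long_lived = [c for c in candidates if is_long_lived(*c)]
```

Here only the first and last candidates qualify: the second is long but too similar to the chloroplast genome (i.e. young), and the third is old but short.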



3. Results and Discussion 

3.1. Number and amount of NUPTs in plant species

The identified nuclear genome fragments similar to the chloroplast genome
included several fragments that were also similar to the mitochondrial
genome. Thus, we first estimated the proportion of sequences similar between
the chloroplast and mitochondrial genomes (Table 1). Then, we estimated the
proportion of NUPTs that are also similar to the mitochondrial genome. The
results indicated that the values depended on the similarity between the
chloroplast and mitochondrial genome sequences. The proportion of
indistinguishable sequences varied from 0.2% (L. japonicus) to 38.1%
(V. vinifera) and was generally similar to the proportion of the chloroplast
genome that is homologous to the mitochondrial genome. The exceptions were
A. thaliana and O. sativa. Arabidopsis thaliana has an integration of almost
the whole mitochondrial genome on Chromosome 2 (ref. 61), which inflated the
proportion of indistinguishable sequences. The reason for the low proportion
of indistinguishable sequences in O. sativa was not clear. The results of
the following analyses, however, were similar even if these
indistinguishable fragments were included, suggesting similar evolutionary
dynamics between NUPTs and nuclear mitochondrial DNAs (NUMTs) within a
species. Thus, the ambiguous sequences were not excluded while performing
the other analyses.

Table 1. Summary of homologous regions between chloroplast and mitochondrial genomes

Species               Genome size (kb)           Homologous regions between            NUPTs
                                                 chloroplast and mitochondria                       Also similar to mitochondrial genome
                      Chloroplast  Mitochondria  Number   Length (bp)  Proportion to   Length (bp)  Length (bp)  Proportion (%)
                                                                       chloroplast
                                                                       genome (%)
Arabidopsis thaliana  154.5        366.9         8        3298         2.1             17658        3167         17.9
Carica papaya         160.1        478.9         8        19798        12.4            269824       53588        19.9
Vitis vinifera        160.9        773.3         36       63033        39.2            337711       128698       38.1
Lotus japonicus       150.5        380.9         7        2687         1.8             147286       239          0.2
Oryza sativa          134.6        490.5         15       24770        18.4            846607       42326        5.0
Sorghum bicolor       140.8        468.6         16       24328        17.3            169352       27013        16.0
Zea mays              140.4        569.6         10       21729        15.5            1006782      160965       15.5

Table 2.

Species           Genome size (Mb) a      NUPTs                            Proportion to nuclear genome (%)
                  Nuclear   Chloroplast   Number         Length (kb)
A. thaliana       119       0.15          38 (31)        17.7 (14.5)       0.015 (0.012)
C. papaya         343       0.16          613 (486)      269.8 (216.2)     0.079 (0.063)
V. vinifera       486       0.16          900 (497)      337.7 (209.0)     0.069 (0.043)
L. japonicus      301       0.15          394 (392)      147.3 (147.0)     0.049 (0.049)
M. truncatula     567       0.12          361            477.8             0.084
G. max            974       0.15          1435           406.3             0.042
M. esculenta      533       0.16          199            54.4              0.010
R. communis       107       0.16          632            264.2             0.247
P. trichocarpa    481       0.16          293            241.8             0.050
C. sativus        203       0.15          169            49.0              0.054
F. vesca          195       0.16          218            58.2              0.030
S. tuberosum      727       0.16          563            429.6             0.059
S. lycopersicum   782       0.16          1513           674.4             0.084
B. distachyon     271       0.14          863            531.5             0.196
O. sativa         382       0.13          611 (495)      846.6 (804.3)     0.222 (0.210)
S. bicolor        697       0.14          515 (417)      169.4 (142.3)     0.024 (0.020)
Z. mays           2066      0.14          1459 (1099)    1041.3 (880.4)    0.050 (0.043)

In parentheses, the estimated values are shown when mitochondria-chloroplast transferred DNAs were excluded.
a Accumulative length of determined whole-genome sequences in the database.

The number and amount of NUPTs varied among species (Table 2). In all
species, most identified NUPTs were short fragments (the median ranged from
175 bp for L. japonicus to 514 bp for P. trichocarpa), and many of them were
less than 200 bp (25% in P. trichocarpa to 65% in L. japonicus). Arabidopsis
thaliana had only 38 NUPTs, and the total length of the NUPTs was about
18 kb. On the other hand, Z. mays had about 1500 NUPTs, and the total length
was nearly 1 Mb. Previously, species with large genome sizes were shown to
contain larger amounts of NUPTs than those with small genome sizes.62
Similarly, in our study, a positive correlation existed between genome size
and cumulative length of NUPTs (P < 0.01 by Kendall's rank correlation;
Supplementary Fig. S1). However, the proportions of NUPTs in the nuclear
genome were not constant among species. The lowest proportion was found in
M. esculenta, in which only about 10^-4 of the nuclear genome showed
similarity to its chloroplast genome. On the other hand, the monocot species
O. sativa and the dicot species R. communis had >0.2% of the nuclear genome
similar to the chloroplast genome. The amount of NUPTs differed even between
closely related species. In Gramineae species, the proportion of NUPTs
varied from 0.024% (S. bicolor) to 0.222% (O. sativa). These differences
might reflect differences in not only genome size but also genome
complexity, the proportion of repetitive elements, and/or other factors.
Because the assembly and annotation of the genome have not yet been
completed in most species, additional NUPTs might be found in centromeres
and chromosome knobs, where a high amount of NUPTs would be expected.
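
The rank correlation mentioned above can be illustrated with the nuclear genome sizes and cumulative NUPT lengths printed in Table 2. The sketch below computes Kendall's tau (the simple tau-a form, which is adequate here because neither variable contains ties); it is an illustration only and does not reproduce the P-value calculation of the study. The function name is hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Values as printed in Table 2, in table order (A. thaliana ... Z. mays):
# nuclear genome size (Mb) and total NUPT length (kb).
genome_mb = [119, 343, 486, 301, 567, 974, 533, 107, 481,
             203, 195, 727, 782, 271, 382, 697, 2066]
nupt_kb = [17.7, 269.8, 337.7, 147.3, 477.8, 406.3, 54.4, 264.2, 241.8,
           49.0, 58.2, 429.6, 674.4, 531.5, 846.6, 169.4, 1041.3]

tau = kendall_tau(genome_mb, nupt_kb)
print(f"Kendall's tau = {tau:.3f}")
```

A positive tau means that species with larger genomes tend to carry a greater cumulative length of NUPTs, in line with the statement in the text.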

3.2. Distribution of NUPTs according to their origin 
from the chloroplast genome 
The structural differences among chloroplast genome regions could cause
variation in the transfer rate to the nuclear genome. Thus, the incidence of
transfer would not necessarily be uniform throughout the chloroplast genome.
However, no regions had an extremely low or high amount of NUPTs in all
species (Fig. 1), although there were significant differences in NUPT origin
throughout the chloroplast genome. The absence of obvious hot or cold spots
(regions) of NUPT origin suggests that no chloroplast genome regions have a
strong deleterious effect on the host genome when transferred to the nuclear
genome. Moreover, no local chloroplast genomic regions appeared to be
resistant to the transfer/integration mechanisms. Taken together with the
presence of long NUPTs, the fact that NUPTs frequently originated from not
only genic regions but also intergenic regions of chloroplast genomes
suggests that the integration predominantly occurred through DNA molecules,
as previously suggested.63,64

Figure 1. Distribution of NUPT sequences on the chloroplast genome. The amounts of NUPT origins on each chloroplast genome are shown for
all 17 species. Numbers of NUPTs are plotted for each base pair of the chloroplast genome. For seven species with assembled mitochondrial
genomes, NUPTs from the homologous regions between the chloroplast and mitochondrial genomes are shown by light gray bars, whereas
other NUPT origins are shown by black bars. For the other 10 species, NUPT origins are shown by gray bars. Shaded boxes represent inverted
repeat regions. Dark gray lines in the x-axis represent chloroplast genomes, where homologous regions between chloroplast and
mitochondria are shown in white.

Matsuo et al.21 suggested that a high amount of NUPTs was present in the
chloroplast-mitochondria homologous regions in rice. We also confirmed this
biased origin in all species in which all three genome sequences were
available (Supplementary Table S3). These results are reasonable because
NUMTs cannot be distinguished from NUPTs in these regions, leading to the
inflation of estimated NUPT numbers.



3.3. Biased distribution of NUPTs in the nuclear genome 
The amount of NUPTs also varied among nuclear chromosomal regions in each
species (Fig. 2). Most species had regions with extremely large amounts of
NUPTs. In a 1-Mb region, the expected amount of NUPTs ranged from 100 bp to
2.5 kb in each species, but there were many regions that contained more than
5 kb of NUPTs. Because NUPTs are known to be clustered in specific
regions,21,65 several regions would be rich in NUPTs. NUPTs close to the
centromere regions were suggested to be longer and younger in O. sativa.
Michalovova et al.66 showed that the pericentromere-biased distribution of
NUPTs was observed especially in species with a small genome size, whereas
species with large genome sizes showed a wide distribution of NUPTs. We also
observed that the distribution of NUPTs was not always similar among species
(Fig. 2). In species in which centromere locations were known, a high amount
of NUPTs was detected not only in the regions close to the centromere but
also in the distal regions of the chromosomes, even though such distal
regions often contain gene-rich regions. This finding might imply that the
tolerable amount of NUPTs in the distal regions differs across species.

Figure 2. Position of NUPTs on the chromosomes. Locations and amounts of NUPTs are shown for six species (A. thaliana, L. japonicus, V. vinifera,
O. sativa, S. bicolor, and Z. mays). Vertical lines and empty diamonds represent chromosomes and their centromeres. The nuclear genome was
split into 1-Mb regions, and the amount of NUPTs in each region was plotted across a horizontal bar.
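
The per-region summary behind this figure can be sketched as a simple binning step: each chromosome is divided into 1-Mb windows and the NUPT base pairs falling in each window are summed. In the sketch below, a NUPT that crosses a window boundary is split between the two windows, which is an assumption; the function name and coordinates are hypothetical.

```python
def nupt_bp_per_window(nupts, chrom_length, window=1_000_000):
    """Sum NUPT base pairs in consecutive windows of one chromosome.

    `nupts` holds (start, end) pairs in 1-based inclusive coordinates.
    NUPTs spanning a window boundary are split at the boundary (assumption).
    """
    n_windows = -(-chrom_length // window)  # ceiling division
    totals = [0] * n_windows
    for start, end in nupts:
        pos = start
        while pos <= end:
            idx = (pos - 1) // window
            seg_end = min(end, (idx + 1) * window)
            totals[idx] += seg_end - pos + 1
            pos = seg_end + 1
    return totals


# Hypothetical 3-Mb chromosome with one NUPT crossing the first window
# boundary and one 6-kb NUPT in the third window.
totals = nupt_bp_per_window([(999_901, 1_000_100), (2_500_001, 2_506_000)],
                            chrom_length=3_000_000)
# Windows holding more than 5 kb of NUPTs, as highlighted in the text.
rich_windows = [i for i, bp in enumerate(totals) if bp > 5000]
```

Comparing each window total with the genome-wide expectation (genome proportion of NUPTs multiplied by the window size) identifies the NUPT-rich regions discussed above.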



The high amount of NUPTs in specific chromosomal regions could be related to
the presence of TEs. Because NUPTs are non-functional elements like TEs, the
pressure for their degradation and elimination would be high in gene-rich
regions, where foreign DNA insertions disrupt and/or interrupt accurate gene
function and regulation. Indeed, the regions surrounding NUPTs were rich in
TEs in all three analysed species (Fig. 3). Arabidopsis thaliana and
Z. mays, but not O. sativa, had a significantly higher amount of TEs around
NUPTs, suggesting co-localization or similar genomic organization of NUPTs
and TEs. Our findings imply similarities between NUPTs and TEs, which are
considered 'junk' DNA and show restricted distributions in the host genomes.

3.4. Age distribution of NUPTs 

There were two distinct patterns of NUPT age distribution in the analysed
plant species (Supplementary Fig. S2). One was typical for A. thaliana and
most other dicot species, where no clear biases were observed. The other was
found in all Gramineae species and a few dicot species, showing a very high
proportion of young fragments that decreased dramatically with time.
Although these two patterns were completely different, closely related
species occasionally showed different patterns from each other. In Fabaceae,
G. max had no biased age distribution of NUPTs, whereas M. truncatula showed
a high peak at the youngest age. Although age distribution patterns differed
between species, when the same scale of NUPT amount was used (Fig. 4), the
two typical patterns could be explained by the combination of two different
categories: an exponentially reduced part and a low, constantly existing
part. The distributions of Gramineae species and a few dicot species
(P. trichocarpa and M. truncatula) seem to be composed of these two
categories, whereas those of other dicots would have only a small portion of
the exponentially reduced part. These two categories might represent the
difference in selective pressure against NUPTs in each category. We discuss
this hypothesis later.

Figure 3. Types of TEs in the vicinity of NUPTs. The average number of TEs
in the vicinity of NUPTs is shown for A. thaliana (15), O. sativa (163), and
Z. mays (377), together with that in randomly extracted regions. The 5-kb
sequences from both 5' and 3' flanking regions were extracted to analyse the
number of TEs. TEs in sequences were estimated using the program
RepeatMasker (available from: http://repeatmasker.org). For comparison, the
average number of TEs within randomly extracted regions from each genome was
also estimated. The figures in parentheses show the number of NUPTs analysed
(with identity from 90 to 96%, >100 bp). **P < 0.01; ***P < 0.001.
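
The two-category description above can be written as a toy model in which the amount of NUPTs of a given relative age is the sum of an exponentially decaying component and a constant background. This is only a sketch of the verbal argument: the functional form, the function name, and all parameter values are hypothetical and were not fitted to any data in this study.

```python
import math


def expected_amount(age, young_amount, decay_rate, constant_amount):
    """Toy two-component age profile: exponential decay plus constant part."""
    return young_amount * math.exp(-decay_rate * age) + constant_amount


ages = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  # relative age as p-distance in percent

# Hypothetical "Gramineae-like" profile: large, rapidly eliminated young part.
steep = [expected_amount(a, 100.0, 1.0, 5.0) for a in ages]
# Hypothetical "dicot-like" profile: almost no exponentially reduced part.
flat = [expected_amount(a, 2.0, 1.0, 5.0) for a in ages]
```

With the same constant background, the two profiles differ mainly in the youngest age classes and converge at older ages, which mirrors the contrast between the two observed patterns.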

In addition to the elimination of individual NUPTs, the lengths of NUPTs
had changed pervasively from those of the original chloroplast regions
(Fig. 5). Differences in NUPT length correlated with NUPT age, indicating
time-dependent accumulation of indel variations. The cumulative length
differences in age classes were mostly negative, suggesting that NUPTs
became shorter after integration into the nuclear genome (Supplementary
Fig. S3). These results suggest that the degradation of NUPTs occurred not
only by complete elimination of individual NUPTs but also by small indel
variations.
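
The cumulative length difference per age class can be sketched as follows: for each NUPT, the length of the nuclear copy minus the length of the corresponding chloroplast region is added to the total of its age class. The 1% bin width, the record layout, and the example values are assumptions made for illustration.

```python
from collections import defaultdict


def cumulative_length_difference(records, bin_width=1.0):
    """Sum (NUPT length - chloroplast length) per age class.

    `records` holds (p_distance_percent, nupt_length, chloroplast_length).
    Age classes are bins of `bin_width` percent p-distance (assumption).
    """
    totals = defaultdict(int)
    for p_pct, nupt_len, cp_len in records:
        totals[int(p_pct // bin_width)] += nupt_len - cp_len
    return dict(totals)


# Hypothetical NUPTs: two young ones and two older ones.
records = [(0.5, 1000, 1000), (0.8, 495, 500),
           (3.2, 1180, 1200), (3.9, 210, 205)]
by_age = cumulative_length_difference(records)
```

A negative total in an age class indicates that deletions outweigh insertions among the NUPTs of that age, which is the pattern reported in the text.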

Several NUPTs are known to be located in the same region with
fragmentations and inversions.14,21,65 This arrangement could have
originated during the insertion process, creating short, fragmented NUPTs
around long NUPTs. Alternatively, long NUPTs might have been degraded into
short fragments or eliminated from the genome. This pattern was observed in
many analysed species, especially in the species having numerous young NUPTs
(Fig. 6). In such species, some NUPTs were as long as several tens of kb,
but most of them had <4% p-distances to the chloroplast genome. This result
suggests that effective degradation and elimination occurred in all species
and that only small NUPTs can escape elimination. It is of interest to know
why some relatively long NUPTs (>5 kb) still exist without degradation. Such
long-lived NUPTs might have been assigned new functions as genes or
regulatory elements and thus be maintained under negative selection.
However, the old NUPTs (>4% p-distances to the chloroplast genome) rarely
maintained lengths longer than 5 kb. Such long, old NUPTs were surrounded by
short NUPTs (Table 3). Moreover, a high amount of repetitive sequences was
located around the clusters of such NUPTs. These observations suggest that
the long-lived NUPTs are relics of large NUPTs that are being degraded and
fragmented, rather than NUPTs that have become functionally important.



3.5. Patterns of maintenance and elimination 
of NUPTs during evolution 
The presence of NUPTs in the nuclear genome should be determined by the
balance between the transfer and elimination rates during evolution. The
transfer rate from the chloroplast to the nuclear genome was estimated to be
about 6 × 10^-5 per pollen grain (ref. 9) or 2 × 10^-7 per cell (ref. 11) by
direct experiments involving transgenic tobacco. These values should
underestimate the transfer rate of arbitrary chloroplast genome
regions,9,11,12 because, in these experiments, antibiotic-resistance genes
of about 1-1.5 kb were used, and the whole genic region, including promoter
sequences, had to be integrated into a chromosomal region where the
integrated genes could be actively transcribed. Indeed, the actual
integrated DNA fragments were 6.0 to over 22.3 kb in length,10 suggesting
that chloroplast genome fragments are integrated as long DNA fragments. Such
long NUPTs were very rare even among recently integrated sequences (0-6%;
Supplementary Table S4). Although the direct experiments were carried out
using tobacco alone, the measured transfer rate is valuable for considering
the dynamics of NUPTs in plant genomes. By applying these values, we could
roughly estimate the transfer rate of NUPTs in each plant species used in
this study. Assuming that all NUPTs are selectively neutral and using the
estimated transfer rate per generation, we found that the expected number of
NUPTs with <1% p-distances varied from 3 × 10^4 (O. sativa) to 2 × 10^6
(V. vinifera) (assuming a neutral mutation rate of 10^-8 per generation).
Even if the proportion of NUPTs longer than 1 kb was used for estimation,
the values exceeded 3 × 10^4. Each estimated value was at least 100 times
larger than the observed number of NUPTs with <1% p-distances in each plant
genome (Supplementary Table S4). This might be partly because of the
overestimation of the amount of NUPTs shorter than 5 kb (or 1 kb): the
number of shorter NUPTs could be increased by degradation beyond that of
actually transferred fragments, although the effect might be limited.
Rather, the difference between expected and observed NUPT numbers indicates
high elimination pressure against the integration of NUPTs in all plant
species. In the early evolutionary stage of endosymbiosis, the transfer of
chloroplast fragments, especially those including genes, would have played
an essential role in the tight regulation of chloroplasts by the host plant.
However, recently transferred chloroplast genome fragments (young NUPTs)
might have no clear functional roles, and most of them are inactive.67,68
The rarity of functionally important NUPTs indicates that they resemble
non-functional junk DNA, such as TEs. NUPT integrations into functional
genes or gene regulatory regions could cause severe deleterious effects. If
NUPTs are non-functional and have deleterious effects, they can survive only
in regions where functional genes are rare. The large amount of long NUPTs
close to the centromere region is reasonable since the elimination pressure
would be lower there.

Table 3. Number of long-lived NUPTs and characteristics of their flanking regions

Species           Number of long-lived NUPTs    Number of repeat sequences within 5-kb flanking regions
A. thaliana       1 (1)                         1/10 kb
C. papaya         1 (0)                         4/10 kb
M. truncatula     2 (0)                         8/20 kb
G. max            1 (1)                         6/10 kb
R. communis       1 (1)                         5/10 kb
P. trichocarpa    1 (0)                         more than 10/10 kb
S. lycopersicum   10 (10)                       more than 10/70 kb
Z. mays           2 (2)                         more than 10/20 kb

Numbers of NUPTs >5 kb in length and with a relatively low identity value (90-96%) are shown.
In parentheses, the numbers of long-lived NUPTs having other NUPTs within the 5-kb surrounding regions are shown.

In this study, we found that the age distribution of NUPTs could be
explained by the combination of an exponentially reduced part and a
uniformly existing part (Fig. 7). The former would consist of deleterious
fragments, which can survive only in gene-poor and TE-rich regions where
elimination is not effective. The uniformly existing parts are rare and
short in all plant species. These old and short NUPTs possibly have less
deleterious effects and have been maintained under neutrality. The different
patterns of NUPT age distribution among plant species might reflect the
amount of deleterious NUPTs in each species, or different integration rates
among species that account for the amount of recently formed NUPTs. The
pattern of degradation and elimination of integrated NUPTs is especially
important for understanding the dynamics of the exponentially reduced part.
Michalovova et al.66 suggested an important role of TEs in the mechanism of
degradation and elimination of NUPTs. However, our results showed that there
are many pervasive changes (small insertions and deletions within or
including NUPTs) in all species, indicating the relative importance of such
indel variations for the degradation and elimination of NUPTs. Experimental
studies using tobacco suggest that rearrangements by pervasive changes
occurred just after integration.20,67 Concomitant integration of multiple
fragments could also form adjacent NUPTs from small disjunct parts of
chloroplast genomes. Lloyd and Timmis68 analysed the patterns of integration
and changes of de novo insertions of chloroplast DNA fragments and
demonstrated integration by non-homologous end joining involving
simultaneous insertion of several chloroplast DNA fragments from different
chloroplast regions. The result suggested concomitant integration of three
DNA fragments from different regions of the chloroplast genome. Although the
relative importance of each factor that determines the pattern and dynamics
of NUPTs is still unclear, both pervasive changes after integration and de
novo concomitant integrations indicate the importance of early-stage changes
in the evolution of NUPTs. Detailed analyses of the elimination and
degradation patterns of young NUPTs in different species might lead to the
elucidation of the factors determining the amount of NUPTs in a species.

Figure 7. Schematic age distribution of NUPTs. Hypothetical age distributions of two different categories of NUPTs are shown. The left figure
represents the distribution of deleterious insertions, whereas the right figure represents the distribution of insertions with no or few
deleterious effects. The thickness of arrows indicates the relative contribution of the deleterious and neutral portions to the two different
categories of NUPT distributions.